Deep Neural Networks for Time Series

${\bf \large \cal Hovhannes \ Grigoryan}\\ {\rm \small New \ York, \ NY}$

Abstract

We perform time series forecasting with deep neural networks, using both RNNs and CNNs. While an RNN only accounts for past data, a CNN can also look ahead and find local patterns. In a CNN, each filter learns a rule that applies to different portions of the time series (much like piecewise linear regression). In addition, CNNs are computationally more efficient, since they require fewer sequential calculations.


Content

  • Trend and seasonality
  • Autocorrelated time series
  • Forecasting synthetic data
  • Machine learning on time windows
  • Forecasting with Time Windows
  • Recurrent Neural Networks for time series
  • Using LSTM and Lambda Layers
  • LSTM with CNN
  • Predicting Sunspots data
  • Appendix: ARMA on Sunspots data

References

[1] L. Moroney, Sequences, Time Series and Prediction, deeplearning.ai

In [1]:
%%html

<style>
body, p, div.rendered_html {
    color: black;
    font-family: "Times New Roman", serif;
    font-size: 12pt;
}
</style>
In [2]:
import sys, os
import warnings

sys.path.append(os.path.abspath(os.path.join('..')))
# warnings.filterwarnings('ignore')

import time
from datetime import datetime
import random
import numpy as np
import pandas as pd
from collections import defaultdict, deque

from sklearn.linear_model import LinearRegression
from sklearn import metrics, preprocessing
from sklearn.metrics import mean_squared_error

import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, InputLayer
from tensorflow.keras.optimizers import Adam

tf.enable_eager_execution()

import matplotlib.pyplot as plt
from matplotlib import animation, rc, cm

from pprint import pprint

from tqdm import tqdm_notebook as tqdm
from tqdm import trange

from utils.make_media import *
from utils.html_converter import html_converter

from IPython.display import HTML, Image, clear_output

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

plt.rcParams['figure.figsize'] = (14,8)
plt.style.use('fivethirtyeight')
rc('animation', html='html5')

root_logdir = "tf_logs"

# Where to save the figures
PROJECT_ROOT_DIR = "."
In [3]:
from tensorflow.python.platform import build_info as tf_build_info
from tensorflow.python.client import device_lib

print("Tensorflow Version:", tf.__version__)
print("GPU available:", tf.test.is_gpu_available(cuda_only=False, min_cuda_compute_capability=None))

if tf.test.gpu_device_name():
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))
else:
    print("Please install GPU version of TF")

print("List of GPU devices", tf.config.experimental.list_physical_devices('GPU'))
Tensorflow Version: 1.15.0
GPU available: True
Default GPU Device: /device:GPU:0
List of GPU devices [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
In [4]:
def reset_graph(seed=42):
    """ To make this notebook's output stable across runs"""
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

def save_fig(fig_id, tight_layout=True):
    path = os.path.join(PROJECT_ROOT_DIR, "img", fig_id + ".png")
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format='png', dpi=300)

Motivation

We often need to perform time series analysis to solve the following problems:

  • Imputation
  • Anomaly Detection
  • Forecasting

Common patterns in time series

  • Trend
  • Seasonality
  • Autocorrelation
  • Noise

Training time series

  • Fixed partitioning
    • Divide time series into training, validation and test sets.
    • If there is seasonality, make sure each set contains a whole number of seasons.
  • Roll-forward partitioning
    • Train on the data from 0 to T and predict T+1; then train on 0 to T+1 and predict T+2, and so on.
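Roll-forward partitioning can be sketched as a small generator (an illustrative sketch, not code from the course):

```python
def roll_forward_splits(series, start=3):
    """Yield (train, target) pairs for roll-forward partitioning:
    train on series[0:T] and predict series[T], then grow T by one."""
    for T in range(start, len(series)):
        yield series[:T], series[T]

# Each successive split extends the training window by one step.
splits = list(roll_forward_splits([10, 11, 12, 13, 14]))
# splits[0] -> ([10, 11, 12], 13); splits[1] -> ([10, 11, 12, 13], 14)
```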

Metrics for evaluating performance

    errors = forecast - actual
    mse = np.square(errors).mean()
    rmse = np.sqrt(mse)
    mae = np.abs(errors).mean()
    mape = np.abs(errors / x_valid).mean()
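These metrics can be computed directly with NumPy; the toy arrays here are purely illustrative:

```python
import numpy as np

actual   = np.array([100.0, 110.0, 120.0])
forecast = np.array([102.0, 108.0, 123.0])

errors = forecast - actual
mse  = np.square(errors).mean()        # mean squared error
rmse = np.sqrt(mse)                    # root mean squared error
mae  = np.abs(errors).mean()           # mean absolute error
mape = np.abs(errors / actual).mean()  # mean absolute percentage error
```

MAE penalizes large errors less heavily than MSE, so it is often preferred when occasional large errors should not dominate the evaluation.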
In [5]:
def plot_series(time, series, format="-", start=0, end=None, label=None):
    plt.plot(time[start:end], series[start:end], format, label=label, lw=0.7)
    plt.xlabel("Time")
    plt.ylabel("Value")
    if label:
        plt.legend(fontsize=14)
    plt.grid(True)

Trend and Seasonality

Let's create a time series that just trends upward:

In [6]:
def trend(time, slope=0):
    return slope * time
In [7]:
time = np.arange(4 * 365 + 1)
baseline = 10
series = trend(time, 0.1)

plt.figure(figsize=(10, 6))
plot_series(time, series)
plt.show()

Now let's generate a time series with a seasonal pattern:

In [8]:
def seasonal_pattern(season_time):
    """Just an arbitrary pattern."""
    return np.where(season_time < 0.4, np.cos(season_time * 2 * np.pi), 1 / np.exp(3 * season_time))

def seasonality(time, period, amplitude=1, phase=0):
    """Repeats the same pattern at each period."""
    season_time = ((time + phase) % period) / period
    return amplitude * seasonal_pattern(season_time)
In [9]:
amplitude = 40
series = seasonality(time, period=365, amplitude=amplitude)

plt.figure(figsize=(10, 6))
plot_series(time, series)
plt.show()

Now let's create a time series with both trend and seasonality:

In [10]:
baseline = 10
slope = 0.05
series = baseline + trend(time, slope) + seasonality(time, period=365, amplitude=amplitude)

plt.figure(figsize=(10, 6))
plot_series(time, series)
plt.show()

In practice few real-life time series have such a smooth signal. They usually have some noise, and the signal-to-noise ratio can sometimes be very low. Let's generate some white noise:

In [11]:
def white_noise(time, noise_level=1, seed=None):
    rnd = np.random.RandomState(seed)
    return rnd.randn(len(time)) * noise_level
In [12]:
noise_level = 5
noise = white_noise(time, noise_level, seed=42)

plt.figure(figsize=(10, 6))
plot_series(time, noise)
plt.show()

Now let's add this white noise to the time series:

In [13]:
series += noise

plt.figure(figsize=(10, 6))
plot_series(time, series)
plt.show()

All right, this looks realistic enough for now.

Autocorrelated time series

Let's generate an autocorrelated time series:

In [14]:
def autocorrelation1(time, amplitude, seed=None):
    rnd = np.random.RandomState(seed)
    φ1 = 0.5
    φ2 = -0.1
    ar = rnd.randn(len(time) + 50)
    ar[:50] = 100
    for step in range(50, len(time) + 50):
        ar[step] += φ1 * ar[step - 50]
        ar[step] += φ2 * ar[step - 33]
    return ar[50:] * amplitude
In [15]:
def autocorrelation2(time, amplitude, seed=None):
    rnd = np.random.RandomState(seed)
    φ = 0.8
    ar = rnd.randn(len(time) + 1)
    for step in range(1, len(time) + 1):
        ar[step] += φ * ar[step - 1]
    return ar[1:] * amplitude
In [16]:
series = autocorrelation1(time, 10, seed=42)
plot_series(time[:200], series[:200])
plt.show()
In [17]:
series = autocorrelation2(time, 10, seed=42)
plot_series(time[:200], series[:200])
plt.show()
In [18]:
series = autocorrelation2(time, 10, seed=42) + trend(time, 2)
plot_series(time[:200], series[:200])
plt.show()
In [19]:
series = autocorrelation2(time, 10, seed=42) + seasonality(time, period=50, amplitude=150) + trend(time, 2)
plot_series(time[:200], series[:200])
plt.show()

Regime change

In [20]:
series = autocorrelation2(time, 10, seed=42) + seasonality(time, period=50, amplitude=150) + trend(time, 2)
series2 = autocorrelation2(time, 5, seed=42) + seasonality(time, period=50, amplitude=2) + trend(time, -1) + 550
series[200:] = series2[200:]

plot_series(time[:300], series[:300])
plt.show()

Adding impulses

In [21]:
def impulses(time, num_impulses, amplitude=1, seed=None):
    rnd = np.random.RandomState(seed)
    impulse_indices = rnd.randint(len(time), size=num_impulses)
    series = np.zeros(len(time))
    for index in impulse_indices:
        series[index] += rnd.rand() * amplitude
    return series
In [22]:
series = impulses(time, 10, seed=42)
plot_series(time, series)
plt.show()
In [23]:
def autocorrelation(source, φs):
    ar = source.copy()
    for step in range(len(source)):
        for lag, φ in φs.items():
            if step - lag > 0:
                ar[step] += φ * ar[step - lag]
    return ar
In [24]:
signal = impulses(time, 10, seed=42)
series = autocorrelation(signal, {1: 0.99})
plot_series(time, series)
plt.plot(time, signal, "k-")
plt.show()
In [25]:
signal = impulses(time, 10, seed=42)
series = autocorrelation(signal, {1: 0.70, 50: 0.2})
plot_series(time, series)
plt.plot(time, signal, "k-")
plt.show()
In [26]:
series_diff1 = series[1:] - series[:-1]
plot_series(time[1:], series_diff1)
In [27]:
from pandas.plotting import autocorrelation_plot

autocorrelation_plot(series)
plt.grid()
In [28]:
from statsmodels.tsa.arima_model import ARIMA

model = ARIMA(series, order=(5, 1, 0))
model_fit = model.fit(disp=0)
print(model_fit.summary())
                             ARIMA Model Results
==============================================================================
Dep. Variable:                    D.y   No. Observations:                 1460
Model:                 ARIMA(5, 1, 0)   Log Likelihood                2223.428
Method:                       css-mle   S.D. of innovations              0.053
Date:                Thu, 07 Nov 2019   AIC                          -4432.855
Time:                        16:31:24   BIC                          -4395.852
Sample:                             1   HQIC                         -4419.052

==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0003      0.001      0.384      0.701      -0.001       0.002
ar.L1.D.y     -0.1235      0.026     -4.714      0.000      -0.175      -0.072
ar.L2.D.y     -0.1254      0.029     -4.333      0.000      -0.182      -0.069
ar.L3.D.y     -0.1089      0.029     -3.759      0.000      -0.166      -0.052
ar.L4.D.y     -0.0914      0.029     -3.162      0.002      -0.148      -0.035
ar.L5.D.y     -0.0774      0.029     -2.675      0.008      -0.134      -0.021
                                    Roots
=============================================================================
                  Real          Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1            1.0145           -1.1311j            1.5194           -0.1336
AR.2            1.0145           +1.1311j            1.5194            0.1336
AR.3           -1.8173           -0.0000j            1.8173           -0.5000
AR.4           -0.6967           -1.6113j            1.7554           -0.3150
AR.5           -0.6967           +1.6113j            1.7554            0.3150
-----------------------------------------------------------------------------

Forecasting synthetic data

Our synthetic data will contain: trend + seasonality + noise

In [29]:
time = np.arange(4 * 365 + 1, dtype="float32")
baseline = 10
amplitude = 40
slope = 0.05
noise_level = 5

# Create the series
series = baseline + trend(time, slope) + seasonality(time, period=365, amplitude=amplitude)
# Update with noise
series += white_noise(time, noise_level, seed=42)

plt.figure(figsize=(10, 6))
plot_series(time, series)
plt.show()

Now that we have the time series, let's split it so we can start forecasting:

In [30]:
split_time = 1000

time_train = time[:split_time]
time_valid = time[split_time:]

x_train = series[:split_time]
x_valid = series[split_time:]

plt.figure(figsize=(10, 6))
plot_series(time_train, x_train)
plt.show()

plt.figure(figsize=(10, 6))
plot_series(time_valid, x_valid)
plt.show()

Naive forecast

Assume that the next value is the same as the last value.

In [31]:
naive_forecast = series[split_time - 1:-1]
In [32]:
plt.figure(figsize=(10, 6))
plot_series(time_valid, x_valid)
plot_series(time_valid, naive_forecast)

Let's zoom in on the start of the validation period:

In [33]:
plt.figure(figsize=(10, 6))
plot_series(time_valid, x_valid, start=0, end=150)
plot_series(time_valid, naive_forecast, start=1, end=151)

As designed, the naive forecast lags one step behind the original time series.

Now let's compute the mean squared error and the mean absolute error between the forecasts and the actual values over the validation period:

In [34]:
print(keras.metrics.mean_squared_error(x_valid, naive_forecast).numpy())
print(keras.metrics.mean_absolute_error(x_valid, naive_forecast).numpy())
61.827538
5.9379086

That's our baseline!

Moving average forecast

In [35]:
def moving_average_forecast(series, window_size):
    """Forecasts the mean of the last few values.
     If window_size=1, then this is equivalent to naive forecast"""
    forecast = []
    for time in range(len(series) - window_size):
        forecast.append(series[time:time + window_size].mean())
    return np.array(forecast)
In [36]:
moving_avg = moving_average_forecast(series, 30)[split_time - 30:]

plt.figure(figsize=(10, 6))
plot_series(time_valid, x_valid)
plot_series(time_valid, moving_avg)
In [37]:
print(keras.metrics.mean_squared_error(x_valid, moving_avg).numpy())
print(keras.metrics.mean_absolute_error(x_valid, moving_avg).numpy())
106.674576
7.142419

That's worse than the naive forecast! The moving average does not anticipate trend or seasonality, so let's remove them by differencing. Since the seasonality period is 365 days, we subtract the value at time t - 365 from the value at time t.

Moving average on differenced time series

In [38]:
diff_series = (series[365:] - series[:-365])
diff_time = time[365:]

plt.figure(figsize=(10, 6))
plot_series(diff_time, diff_series)
plt.show()

Great, the trend and seasonality seem to be gone, so now we can use the moving average:

In [39]:
diff_moving_avg = moving_average_forecast(diff_series, 50)[split_time - 365 - 50:]

plt.figure(figsize=(10, 6))
plot_series(time_valid, diff_series[split_time - 365:])
plot_series(time_valid, diff_moving_avg)
plt.show()

Now let's bring back the trend and seasonality by adding the past values from t - 365:

In [40]:
diff_moving_avg_plus_past = series[split_time - 365:-365] + diff_moving_avg

plt.figure(figsize=(10, 6))
plot_series(time_valid, x_valid)
plot_series(time_valid, diff_moving_avg_plus_past)
plt.show()
In [41]:
print(keras.metrics.mean_squared_error(x_valid, diff_moving_avg_plus_past).numpy())
print(keras.metrics.mean_absolute_error(x_valid, diff_moving_avg_plus_past).numpy())
52.973656
5.8393106

Better than the naive forecast, good. However, the forecasts look a bit too random, because we're just adding back past values, which were noisy. Let's apply a moving average to the past values to remove some of the noise:

Smoothing Both Past and Present Values

Forecasts = trailing MA of diff series + centered MA of past series (t-365)
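The distinction between the two averages can be sketched in NumPy (the window lengths and alignment here are illustrative). A trailing MA uses only values before the forecast point, while a centered MA uses values on both sides, which is only possible on past data such as the series at t - 365:

```python
import numpy as np

def trailing_ma(x, w):
    """Average of the last w values, aligned to the forecast point."""
    return np.array([x[t - w:t].mean() for t in range(w, len(x) + 1)])

def centered_ma(x, w):
    """Average of a window centered on each point (w odd).
    Needs future values, so it only works on past data."""
    h = w // 2
    return np.array([x[t - h:t + h + 1].mean() for t in range(h, len(x) - h)])

x = np.arange(10, dtype=float)
trailing_ma(x, 3)  # first value averages x[0:3] -> 1.0
centered_ma(x, 3)  # first value also averages x[0:3] -> 1.0
```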

In [42]:
diff_moving_avg_plus_smooth_past = moving_average_forecast(series[split_time - 370:-360], 10) + diff_moving_avg

plt.figure(figsize=(10, 6))
plot_series(time_valid, x_valid)
plot_series(time_valid, diff_moving_avg_plus_smooth_past)
plt.show()
In [43]:
print(keras.metrics.mean_squared_error(x_valid, diff_moving_avg_plus_smooth_past).numpy())
print(keras.metrics.mean_absolute_error(x_valid, diff_moving_avg_plus_smooth_past).numpy())
33.45226
4.569442

Machine learning on time windows

Preparing features and labels

In [44]:
dataset = tf.data.Dataset.range(10)

for val in dataset:
    print(val.numpy(), end=" ")
0 1 2 3 4 5 6 7 8 9 

Using tf.data.Dataset.range(T).window(k)

    tf.data.Dataset.range(7).window(2) produces { {0, 1}, {2, 3}, {4, 5}, {6}}
    tf.data.Dataset.range(7).window(3, 2, 1, True) produces { {0, 1, 2}, {2, 3, 4}, {4, 5, 6}}
    tf.data.Dataset.range(7).window(3, 1, 2, True) produces { {0, 2, 4}, {1, 3, 5}, {2, 4, 6}}
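The window(size, shift, stride, drop_remainder) semantics can be mimicked in plain Python to make them concrete (a sketch only; tf.data evaluates windows lazily over datasets):

```python
def windows(n, size, shift=None, stride=1, drop_remainder=False):
    """Mimic tf.data.Dataset.range(n).window(size, shift, stride, drop_remainder)
    on plain Python lists. shift defaults to size, as in tf.data."""
    shift = size if shift is None else shift
    result = []
    for start in range(0, n, shift):
        # each window takes size elements spaced stride apart
        w = [start + i * stride for i in range(size) if start + i * stride < n]
        if drop_remainder and len(w) < size:
            continue  # drop incomplete windows
        result.append(w)
    return result

windows(7, 2)              # -> [[0, 1], [2, 3], [4, 5], [6]]
windows(7, 3, 2, 1, True)  # -> [[0, 1, 2], [2, 3, 4], [4, 5, 6]]
windows(7, 3, 1, 2, True)  # -> [[0, 2, 4], [1, 3, 5], [2, 4, 6]]
```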
In [45]:
dataset = tf.data.Dataset.range(10)
dataset = dataset.window(5, shift=1)

for window_dataset in dataset:
    for val in window_dataset:
        print(val.numpy(), end=" ")
    print()
0 1 2 3 4
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
5 6 7 8 9
6 7 8 9
7 8 9
8 9
9

Passing drop_remainder=True drops the last 4 windows, which have fewer than 5 elements:

In [46]:
dataset = tf.data.Dataset.range(10)
dataset = dataset.window(5, shift=1, drop_remainder=True)
for window_dataset in dataset:
    for val in window_dataset:
        print(val.numpy(), end=" ")
    print()
0 1 2 3 4
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
5 6 7 8 9

This converts each window (a nested dataset) into a single batch tensor, printed below as a NumPy array:

In [47]:
dataset = tf.data.Dataset.range(10)
dataset = dataset.window(5, shift=1, drop_remainder=True)
dataset = dataset.flat_map(lambda window: window.batch(5))

for window in dataset:
    print(window.numpy())
[0 1 2 3 4]
[1 2 3 4 5]
[2 3 4 5 6]
[3 4 5 6 7]
[4 5 6 7 8]
[5 6 7 8 9]

Splitting the data into features (the lagged values) and labels (the next value):

In [48]:
dataset = tf.data.Dataset.range(10)
dataset = dataset.window(5, shift=1, drop_remainder=True)
dataset = dataset.flat_map(lambda window: window.batch(5))
dataset = dataset.map(lambda window: (window[:-1], window[-1:])) # this is where the splitting is done

for x,y in dataset:
    print(x.numpy(), y.numpy())
[0 1 2 3] [4]
[1 2 3 4] [5]
[2 3 4 5] [6]
[3 4 5 6] [7]
[4 5 6 7] [8]
[5 6 7 8] [9]

This shuffles the (feature, label) pairs. Here buffer_size should be at least the number of windows, so the shuffle is uniform. Shuffling is necessary to avoid sequence bias.

In [49]:
dataset = tf.data.Dataset.range(10)
dataset = dataset.window(5, shift=1, drop_remainder=True)
dataset = dataset.flat_map(lambda window: window.batch(5))
dataset = dataset.map(lambda window: (window[:-1], window[-1:]))
dataset = dataset.shuffle(buffer_size=10)

for x,y in dataset:
    print(x.numpy(), y.numpy())
[3 4 5 6] [7]
[0 1 2 3] [4]
[2 3 4 5] [6]
[4 5 6 7] [8]
[1 2 3 4] [5]
[5 6 7 8] [9]

This batches the data into batches of size batch_size:

In [50]:
batch_size = 2

dataset = tf.data.Dataset.range(10)
dataset = dataset.window(5, shift=1, drop_remainder=True)
dataset = dataset.flat_map(lambda window: window.batch(5))
dataset = dataset.map(lambda window: (window[:-1], window[-1:]))
dataset = dataset.shuffle(buffer_size=10)
dataset = dataset.batch(batch_size).prefetch(1)

for x,y in dataset:
    print("x = ", x.numpy())
    print("y = ", y.numpy())
x =  [[2 3 4 5]
 [1 2 3 4]]
y =  [[6]
 [5]]
x =  [[4 5 6 7]
 [0 1 2 3]]
y =  [[8]
 [4]]
x =  [[3 4 5 6]
 [5 6 7 8]]
y =  [[7]
 [9]]

Forecasting with Time Windows

In [51]:
time = np.arange(4 * 365 + 1, dtype="float32")

baseline = 10
amplitude = 40
slope = 0.05
noise_level = 5

# Create the series
series = baseline + trend(time, slope) + seasonality(time, period=365, amplitude=amplitude)

# Update with noise
series += white_noise(time, noise_level, seed=42)

split_time = 1000
time_train = time[:split_time]
x_train = series[:split_time]
time_valid = time[split_time:]
x_valid = series[split_time:]

window_size = 20
batch_size = 32
shuffle_buffer_size = 1000
In [52]:
def windowed_dataset(series, window_size, batch_size, shuffle_buffer):
    dataset = tf.data.Dataset.from_tensor_slices(series)
    dataset = dataset.window(window_size + 1, shift=1, drop_remainder=True)
    dataset = dataset.flat_map(lambda window: window.batch(window_size + 1))
    dataset = dataset.shuffle(shuffle_buffer).map(lambda window: (window[:-1], window[-1]))
    dataset = dataset.batch(batch_size).prefetch(1)
    return dataset
In [53]:
dataset = windowed_dataset(x_train, window_size, batch_size, shuffle_buffer_size)

for x, y in dataset:
    print("x = ", x.numpy().shape)
    print("y = ", y.numpy().shape)
    break
x =  (32, 20)
y =  (32,)

Linear regression with 1-Dense Layer

In [54]:
reset_graph()

l0 = tf.keras.layers.Dense(1, input_shape=[window_size])
model = tf.keras.models.Sequential([l0])

model.compile(loss="mse", optimizer=tf.keras.optimizers.SGD(lr=1e-6, momentum=0.9))
model.fit(dataset, epochs=100, verbose=0)

print("Layer weights {}".format(l0.get_weights()))
Layer weights [array([[-0.01988217],
       [ 0.00494236],
       [ 0.00656087],
       [ 0.01940165],
       [-0.08564734],
       [ 0.04886799],
       [ 0.01337459],
       [ 0.0846842 ],
       [-0.087626  ],
       [ 0.00209282],
       [-0.02677982],
       [ 0.08195151],
       [-0.12671997],
       [ 0.05518785],
       [ 0.01961388],
       [ 0.15396477],
       [-0.08422583],
       [ 0.18567665],
       [ 0.31259775],
       [ 0.41760445]], dtype=float32), array([0.01689069], dtype=float32)]
In [55]:
forecast = []

for time in range(len(series) - window_size):
    forecast.append(model.predict(series[time:time + window_size][np.newaxis]))

forecast = forecast[split_time-window_size:]
results = np.array(forecast)[:, 0, 0]


plt.figure(figsize=(10, 6))

plot_series(time_valid, x_valid)
plot_series(time_valid, results)
In [56]:
tf.keras.metrics.mean_absolute_error(x_valid, results).numpy()
Out[56]:
5.256559

DNN: 3-Dense Layers

In [57]:
reset_graph()

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(10, input_shape=[window_size], activation="relu"),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(1)
])

model.compile(loss="mse", optimizer=tf.keras.optimizers.SGD(lr=1e-6, momentum=0.9))
model.fit(dataset, epochs=100, verbose=0)
Out[57]:
<tensorflow.python.keras.callbacks.History at 0x1a227d948d0>
In [58]:
forecast = []
for time in range(len(series) - window_size):
    forecast.append(model.predict(series[time:time + window_size][np.newaxis]))

forecast = forecast[split_time-window_size:]
results = np.array(forecast)[:, 0, 0]


plt.figure(figsize=(10, 6))

plot_series(time_valid, x_valid)
plot_series(time_valid, results)
In [59]:
tf.keras.metrics.mean_absolute_error(x_valid, results).numpy()
Out[59]:
5.072883

Searching the best learning rate

In [60]:
reset_graph()

dataset = windowed_dataset(x_train, window_size, batch_size, shuffle_buffer_size)

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(10, input_shape=[window_size], activation="relu"),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(1)
])

lr_schedule = tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-8 * 10 ** (epoch / 20))

optimizer = tf.keras.optimizers.SGD(lr=1e-8, momentum=0.9)
model.compile(loss="mse", optimizer=optimizer)
history = model.fit(dataset, epochs=100, callbacks=[lr_schedule], verbose=0)
In [61]:
lrs = 1e-8 * (10 ** (np.arange(100) / 20))
plt.semilogx(lrs, history.history["loss"])
plt.axis([1e-8, 1e-3, 0, 300])
Out[61]:
[1e-08, 0.001, 0, 300]

Increasing window size

In [62]:
reset_graph()

window_size = 30
dataset = windowed_dataset(x_train, window_size, batch_size, shuffle_buffer_size)

model = tf.keras.models.Sequential([
  tf.keras.layers.Dense(10, activation="relu", input_shape=[window_size]),
  tf.keras.layers.Dense(10, activation="relu"),
  tf.keras.layers.Dense(1)
])

lr_schedule = tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-8 * 10 ** (epoch / 20))

optimizer = tf.keras.optimizers.SGD(lr=1e-8, momentum=0.9)
model.compile(loss="mse", optimizer=optimizer)
history = model.fit(dataset, epochs=500, callbacks=[lr_schedule], verbose=0)
In [63]:
lrs = 1e-8 * (10 ** (np.arange(500) / 20))
plt.semilogx(lrs, history.history["loss"])
plt.axis([1e-8, 1e-3, 0, 300]);
In [64]:
idx = np.where(history.history["loss"] == min(history.history["loss"]))
best_lr = lrs[idx][0]
best_lr
Out[64]:
2.5118864315095798e-05

Using the best learning rate

In [65]:
reset_graph()

model = tf.keras.models.Sequential([
  tf.keras.layers.Dense(10, activation="relu", input_shape=[window_size]),
  tf.keras.layers.Dense(10, activation="relu"),
  tf.keras.layers.Dense(1)
])

optimizer = tf.keras.optimizers.SGD(lr=1e-5, momentum=0.9)
model.compile(loss="mse", optimizer=optimizer)
history = model.fit(dataset, epochs=500, verbose=0)

Plot the training loss, skipping the first 10 epochs:

In [66]:
loss = history.history['loss']
epochs = range(10, len(loss))
plot_loss = loss[10:]
plt.plot(epochs, plot_loss, 'b', label='Training Loss')
plt.show()
In [67]:
forecast = []
for time in range(len(series) - window_size):
    forecast.append(model.predict(series[time:time + window_size][np.newaxis]))

forecast = forecast[split_time-window_size:]
results = np.array(forecast)[:, 0, 0]


plt.figure(figsize=(10, 6))

plot_series(time_valid, x_valid)
plot_series(time_valid, results)
In [68]:
print(keras.metrics.mean_squared_error(x_valid, results).numpy())
print(keras.metrics.mean_absolute_error(x_valid, results).numpy())
43.174564
4.8658423

Recurrent Neural Networks for Time Series

Example of architecture:

Input windows $x$ $\to$ Recurrent Layer $\to$ Recurrent Layer $\to$ Dense Layer $\to$ Forecast $\hat{y}$

  • dims=1 for a univariate time series, dims=n for an $n$-variate time series.

  • x.shape = [batch_size, time_steps, dims]

  • y.shape = [batch_size, # units]

  • Keras's default behavior is sequence-to-vector.

  • For sequence-to-sequence behavior, set return_sequences=True.
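The shape bookkeeping above can be made concrete with a NumPy sketch of a single SimpleRNN layer (the weights here are random placeholders, not trained values):

```python
import numpy as np

def simple_rnn_forward(x, Wx, Wh, b):
    """Unroll a single RNN layer over time and return the last hidden state
    (sequence-to-vector, Keras's default behavior).
    x: [batch_size, time_steps, dims] -> output: [batch_size, units]."""
    batch, time_steps, dims = x.shape
    units = Wh.shape[0]
    h = np.zeros((batch, units))
    for t in range(time_steps):
        # each step mixes the current input with the previous hidden state
        h = np.tanh(x[:, t, :] @ Wx + h @ Wh + b)
    return h

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 20, 1))   # univariate windows: dims=1
Wx = rng.normal(size=(1, 40))      # input-to-hidden weights
Wh = rng.normal(size=(40, 40))     # hidden-to-hidden weights
b = np.zeros(40)
simple_rnn_forward(x, Wx, Wh, b).shape  # -> (32, 40)
```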

In [69]:
reset_graph()
tf.keras.backend.clear_session()
np.random.seed(51)

train_set = windowed_dataset(x_train, window_size, batch_size=128, shuffle_buffer=shuffle_buffer_size)

model = tf.keras.models.Sequential([
  tf.keras.layers.Lambda(lambda x: tf.expand_dims(x, axis=-1), input_shape=[None]),
  tf.keras.layers.SimpleRNN(40, return_sequences=True),
  tf.keras.layers.SimpleRNN(40),
  tf.keras.layers.Dense(1),
  tf.keras.layers.Lambda(lambda x: x * 100.0)
])

lr_schedule = tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-8 * 10**(epoch / 20))
optimizer = tf.keras.optimizers.SGD(lr=1e-8, momentum=0.9)

model.compile(loss=tf.keras.losses.Huber(),
              optimizer=optimizer,
              metrics=["mae"])

history = model.fit(train_set, epochs=100, callbacks=[lr_schedule])
In [70]:
plt.semilogx(history.history["lr"], history.history["loss"])
plt.axis([1e-8, 1e-4, 0, 30])
Out[70]:
[1e-08, 0.0001, 0, 30]
In [71]:
reset_graph()
tf.keras.backend.clear_session()
np.random.seed(51)

dataset = windowed_dataset(x_train, window_size, batch_size=128, shuffle_buffer=shuffle_buffer_size)

model = tf.keras.models.Sequential([
  tf.keras.layers.Lambda(lambda x: tf.expand_dims(x, axis=-1), input_shape=[None]),
  tf.keras.layers.SimpleRNN(40, return_sequences=True),
  tf.keras.layers.SimpleRNN(40),
  tf.keras.layers.Dense(1),
  tf.keras.layers.Lambda(lambda x: x * 100.0)
])

optimizer = tf.keras.optimizers.SGD(lr=5e-5, momentum=0.9)
model.compile(loss=tf.keras.losses.Huber(),
              optimizer=optimizer,
              metrics=["mae"])

history = model.fit(dataset,epochs=400)
In [72]:
forecast=[]
for time in range(len(series) - window_size):
    forecast.append(model.predict(series[time:time + window_size][np.newaxis]))

forecast = forecast[split_time-window_size:]
results = np.array(forecast)[:, 0, 0]


plt.figure(figsize=(10, 6))

plot_series(time_valid, x_valid)
plot_series(time_valid, results)
In [73]:
tf.keras.metrics.mean_absolute_error(x_valid, results).numpy()
Out[73]:
7.0815296
In [74]:
import matplotlib.image  as mpimg

#-----------------------------------------------------------
# Retrieve a list of list results on training and test data
# sets for each training epoch
#-----------------------------------------------------------
mae=history.history['mean_absolute_error']
loss=history.history['loss']

epochs=range(len(loss)) # Get number of epochs

#------------------------------------------------
# Plot MAE and Loss
#------------------------------------------------
plt.plot(epochs, mae, 'r')
plt.plot(epochs, loss, 'b')
plt.title('MAE and Loss')
plt.xlabel("Epochs")
plt.ylabel("Value")
plt.legend(["MAE", "Loss"])

plt.figure()

epochs_zoom = epochs[200:]
mae_zoom = mae[200:]
loss_zoom = loss[200:]

#------------------------------------------------
# Plot Zoomed MAE and Loss
#------------------------------------------------
plt.plot(epochs_zoom, mae_zoom, 'r')
plt.plot(epochs_zoom, loss_zoom, 'b')
plt.title('MAE and Loss')
plt.xlabel("Epochs")
plt.ylabel("Value")
plt.legend(["MAE", "Loss"])

plt.figure()
Out[74]:
<Figure size 432x288 with 0 Axes>
<Figure size 432x288 with 0 Axes>

Using LSTM and Lambda Layers

Things can go very wrong with the wrong choice of parameters.

In [75]:
reset_graph()
tf.keras.backend.clear_session()
np.random.seed(51)

dataset = windowed_dataset(x_train, window_size, batch_size, shuffle_buffer_size)

model = tf.keras.models.Sequential([
  tf.keras.layers.Lambda(lambda x: tf.expand_dims(x, axis=-1), input_shape=[None]),
  tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32, return_sequences=True)),
  tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
  tf.keras.layers.Dense(1),
  tf.keras.layers.Lambda(lambda x: x * 100.0)
])

lr_schedule = tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-8 * 10**(epoch / 20))

optimizer = tf.keras.optimizers.SGD(lr=1e-8, momentum=0.9)
model.compile(loss=tf.keras.losses.Huber(),
              optimizer=optimizer,
              metrics=["mae"])

history = model.fit(dataset, epochs=100, callbacks=[lr_schedule])
In [76]:
plt.semilogx(history.history["lr"], history.history["loss"])
plt.axis([1e-8, 1e-4, 0, 30])
Out[76]:
[1e-08, 0.0001, 0, 30]
In [77]:
reset_graph()
tf.keras.backend.clear_session()
np.random.seed(51)

dataset = windowed_dataset(x_train, window_size, batch_size, shuffle_buffer_size)

model = tf.keras.models.Sequential([
  tf.keras.layers.Lambda(lambda x: tf.expand_dims(x, axis=-1), input_shape=[None]),
  tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32, return_sequences=True)),
  tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
  tf.keras.layers.Dense(1),
  tf.keras.layers.Lambda(lambda x: x * 100.0)
])

model.compile(loss="mse", optimizer=tf.keras.optimizers.SGD(lr=1e-5, momentum=0.9),metrics=["mae"])
history = model.fit(dataset, epochs=100, verbose=0)
In [78]:
forecast = []
results = []
for time in range(len(series) - window_size):
    forecast.append(model.predict(series[time:time + window_size][np.newaxis]))

forecast = forecast[split_time-window_size:]
results = np.array(forecast)[:, 0, 0]

plt.figure(figsize=(10, 6))

plot_series(time_valid, x_valid)
plot_series(time_valid, results)
In [79]:
tf.keras.metrics.mean_absolute_error(x_valid, results).numpy()
Out[79]:
15.367542
In [80]:
#-----------------------------------------------------------
# Retrieve a list of list results on training and test data
# sets for each training epoch
#-----------------------------------------------------------
mae=history.history['mean_absolute_error']
loss=history.history['loss']

epochs=range(len(loss)) # Get number of epochs

#------------------------------------------------
# Plot MAE and Loss
#------------------------------------------------
plt.plot(epochs, mae, 'r')
plt.plot(epochs, loss, 'b')
plt.title('MAE and Loss')
plt.xlabel("Epochs")
plt.ylabel("Value")
plt.legend(["MAE", "Loss"])

plt.figure()

epochs_zoom = epochs[50:]  # this run has only 100 epochs
mae_zoom = mae[50:]
loss_zoom = loss[50:]

#------------------------------------------------
# Plot Zoomed MAE and Loss
#------------------------------------------------
plt.plot(epochs_zoom, mae_zoom, 'r')
plt.plot(epochs_zoom, loss_zoom, 'b')
plt.title('MAE and Loss')
plt.xlabel("Epochs")
plt.ylabel("MAE / Loss")
plt.legend(["MAE", "Loss"])

plt.figure()

LSTM with CNN
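Before wiring a `Conv1D` in front of the LSTMs, it helps to see what `padding="causal"` does: each output depends only on the current and past inputs, never on future ones. A minimal NumPy sketch of a causal 1-D convolution (an illustration of the idea, not the Keras implementation):

```python
import numpy as np

def causal_conv1d(x, w):
    """1-D convolution with 'causal' (left) zero-padding:
    each output y[t] depends only on x[t-k+1 .. t]."""
    k = len(w)
    xp = np.concatenate([np.zeros(k - 1), x])   # pad on the left only
    return np.array([xp[t:t + k] @ w for t in range(len(x))])

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([0.5, 0.5])            # 2-tap moving-average filter
print(causal_conv1d(x, w))          # [0.5 1.5 2.5 3.5]
```

Note that `y[0]` uses a zero-padded past, so no output ever peeks ahead of its time step.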

In [81]:
def windowed_dataset(series, window_size, batch_size, shuffle_buffer):
    # modified for CNN
    series = tf.expand_dims(series, axis=-1)
    ds = tf.data.Dataset.from_tensor_slices(series)
    ds = ds.window(window_size + 1, shift=1, drop_remainder=True)
    ds = ds.flat_map(lambda w: w.batch(window_size + 1))
    ds = ds.shuffle(shuffle_buffer)
    ds = ds.map(lambda w: (w[:-1], w[1:]))
    return ds.batch(batch_size).prefetch(1)

def model_forecast(model, series, window_size):
    ds = tf.data.Dataset.from_tensor_slices(series)
    ds = ds.window(window_size, shift=1, drop_remainder=True)
    ds = ds.flat_map(lambda w: w.batch(window_size))
    ds = ds.batch(32).prefetch(1)
    forecast = model.predict(ds)
    return forecast
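The `tf.data` pipeline above is easier to follow as a plain-NumPy sketch of the same windowing: each window of length `window_size + 1` is split into inputs `w[:-1]` and one-step-shifted targets `w[1:]` (this illustrates only the logic, not the pipeline itself):

```python
import numpy as np

def windows(series, window_size):
    # each window of length window_size+1 yields
    # inputs w[:-1] and one-step-shifted targets w[1:]
    out = []
    for t in range(len(series) - window_size):
        w = series[t:t + window_size + 1]
        out.append((w[:-1], w[1:]))
    return out

pairs = windows(np.arange(6), 3)
print(pairs[0])  # (array([0, 1, 2]), array([1, 2, 3]))
```

Because the targets are whole shifted sequences, the last LSTM can keep `return_sequences=True` and the model is trained on every time step of each window.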
In [82]:
tf.keras.backend.clear_session()
np.random.seed(51)

window_size = 30
train_set = windowed_dataset(x_train, window_size, batch_size=128, shuffle_buffer=shuffle_buffer_size)

model = tf.keras.models.Sequential([
  tf.keras.layers.Conv1D(filters=32, kernel_size=5,
                      strides=1, padding="causal",
                      activation="relu",
                      input_shape=[None, 1]),
  tf.keras.layers.Bidirectional(tf.keras.layers.CuDNNLSTM(32, return_sequences=True)),
  tf.keras.layers.Bidirectional(tf.keras.layers.CuDNNLSTM(32, return_sequences=True)),
  tf.keras.layers.Dense(1),
  tf.keras.layers.Lambda(lambda x: x * 200)
])

lr_schedule = tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-8 * 10**(epoch / 20))

optimizer = tf.keras.optimizers.SGD(lr=1e-8, momentum=0.9)
model.compile(loss=tf.keras.losses.Huber(),
              optimizer=optimizer,
              metrics=["mae"])

history = model.fit(train_set, epochs=100, callbacks=[lr_schedule])
In [83]:
plt.semilogx(history.history["lr"], history.history["loss"])
plt.axis([1e-8, 1e-4, 0, 30])
Out[83]:
[1e-08, 0.0001, 0, 30]
In [84]:
tf.keras.backend.clear_session()
np.random.seed(51)
#batch_size = 16

dataset = windowed_dataset(x_train, window_size, batch_size, shuffle_buffer_size)

model = tf.keras.models.Sequential([
  tf.keras.layers.Conv1D(filters=32, kernel_size=3,
                      strides=1, padding="causal",
                      activation="relu",
                      input_shape=[None, 1]),
  tf.keras.layers.CuDNNLSTM(32, return_sequences=True),
  tf.keras.layers.CuDNNLSTM(32, return_sequences=True),
  tf.keras.layers.Dense(1),
  tf.keras.layers.Lambda(lambda x: x * 200)
])

optimizer = tf.keras.optimizers.SGD(lr=1e-5, momentum=0.9)
model.compile(loss=tf.keras.losses.Huber(),
              optimizer=optimizer,
              metrics=["mae"])

history = model.fit(dataset, epochs=500)
In [85]:
rnn_forecast = model_forecast(model, series[..., np.newaxis], window_size)
rnn_forecast = rnn_forecast[split_time - window_size:-1, -1, 0]
In [86]:
plt.figure(figsize=(10, 6))
plot_series(time_valid, x_valid)
plot_series(time_valid, rnn_forecast)
In [87]:
tf.keras.metrics.mean_absolute_error(x_valid, rnn_forecast).numpy()
Out[87]:
5.9718175
In [88]:
#-----------------------------------------------------------
# Retrieve the training MAE and loss for each training epoch
#-----------------------------------------------------------
mae=history.history['mean_absolute_error']
loss=history.history['loss']

epochs=range(len(loss)) # Get number of epochs

#------------------------------------------------
# Plot MAE and Loss
#------------------------------------------------
plt.plot(epochs, mae, 'r')
plt.plot(epochs, loss, 'b')
plt.title('MAE and Loss')
plt.xlabel("Epochs")
plt.ylabel("MAE / Loss")
plt.legend(["MAE", "Loss"])

plt.figure()

epochs_zoom = epochs[200:]
mae_zoom = mae[200:]
loss_zoom = loss[200:]

#------------------------------------------------
# Plot Zoomed MAE and Loss
#------------------------------------------------
plt.plot(epochs_zoom, mae_zoom, 'r')
plt.plot(epochs_zoom, loss_zoom, 'b')
plt.title('MAE and Loss')
plt.xlabel("Epochs")
plt.ylabel("MAE / Loss")
plt.legend(["MAE", "Loss"])

plt.figure()
Out[88]:
<Figure size 432x288 with 0 Axes>
<Figure size 432x288 with 0 Axes>

Predicting Sunspots Data

We do not attempt to achieve good accuracy here; this is just a demonstration of the method.

In [89]:
import csv
time_step = []
sunspots = []

with open('../data/sunspots.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    next(reader)
    for row in reader:
        sunspots.append(float(row[2]))
        time_step.append(int(row[0]))

series = np.array(sunspots)
time = np.array(time_step)
plt.figure(figsize=(10, 6))
plot_series(time, series)
In [91]:
split_time = 3000
time_train = time[:split_time]
X_train = series[:split_time]
time_valid = time[split_time:]
X_valid = series[split_time:]


mean = X_train.mean()
std = X_train.std()

x_train = (X_train - mean)/std
x_valid = (X_valid - mean)/std

window_size = 30
batch_size = 32
shuffle_buffer_size = 1000
In [92]:
def windowed_dataset(series, window_size, batch_size, shuffle_buffer):
    series = tf.expand_dims(series, axis=-1)
    ds = tf.data.Dataset.from_tensor_slices(series)
    ds = ds.window(window_size + 1, shift=1, drop_remainder=True)
    ds = ds.flat_map(lambda w: w.batch(window_size + 1))
    ds = ds.shuffle(shuffle_buffer)
    ds = ds.map(lambda w: (w[:-1], w[1:]))
    return ds.batch(batch_size).prefetch(1)

def model_forecast(model, series, window_size):
    ds = tf.data.Dataset.from_tensor_slices(series)
    ds = ds.window(window_size, shift=1, drop_remainder=True)
    ds = ds.flat_map(lambda w: w.batch(window_size))
    ds = ds.batch(32).prefetch(1)
    forecast = model.predict(ds)
    return forecast
In [93]:
tf.keras.backend.clear_session()
np.random.seed(51)

window_size = 64
batch_size = 256

train_set = windowed_dataset(x_train, window_size, batch_size, shuffle_buffer_size)


model = tf.keras.models.Sequential([
  tf.keras.layers.Conv1D(filters=32, kernel_size=5,
                      strides=1, padding="causal",
                      activation="relu",
                      input_shape=[None, 1]),
  tf.keras.layers.CuDNNLSTM(64, return_sequences=True),
  tf.keras.layers.CuDNNLSTM(64, return_sequences=True),
  tf.keras.layers.Dense(30, activation="relu"),
  tf.keras.layers.Dense(10, activation="relu"),
  tf.keras.layers.Dense(1),
  tf.keras.layers.Lambda(lambda x: x * 400)
])

lr_schedule = tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-8 * 10**(epoch / 20))

optimizer = tf.keras.optimizers.SGD(lr=1e-8, momentum=0.9)
model.compile(loss=tf.keras.losses.Huber(),
              optimizer=optimizer,
              metrics=["mae"])

history = model.fit(train_set, epochs=100, callbacks=[lr_schedule])
In [94]:
plt.semilogx(history.history["lr"], history.history["loss"])
plt.axis([1e-8, 1e-4, 0, 60])
Out[94]:
[1e-08, 0.0001, 0, 60]
In [95]:
from tensorflow.keras.callbacks import EarlyStopping

tf.keras.backend.clear_session()
np.random.seed(51)

train_set = windowed_dataset(x_train, window_size=60, batch_size=100, shuffle_buffer=shuffle_buffer_size)

model = tf.keras.models.Sequential([
  tf.keras.layers.Conv1D(filters=60, kernel_size=5,
                      strides=1, padding="causal",
                      activation="relu",
                      input_shape=[None, 1]),
  tf.keras.layers.CuDNNLSTM(60, return_sequences=True),
  tf.keras.layers.CuDNNLSTM(60, return_sequences=True),
  tf.keras.layers.Dense(30, activation="relu"),
  tf.keras.layers.Dense(10, activation="relu"),
  tf.keras.layers.Dense(1),
  tf.keras.layers.Lambda(lambda x: x * 400)
])

es = EarlyStopping(monitor='mean_absolute_error', mode='min', verbose=1, patience=30)

optimizer = tf.keras.optimizers.SGD(lr=1e-5, momentum=0.9)

model.compile(loss=tf.keras.losses.Huber(),
              optimizer=optimizer,
              metrics=["mae"])

history = model.fit(train_set, epochs=500, callbacks=[es])
In [96]:
rnn_forecast = model_forecast(model, series[..., np.newaxis], window_size)
rnn_forecast = rnn_forecast[split_time - window_size:-1, -1, 0]
In [97]:
plt.figure(figsize=(10, 6))
plot_series(time_valid, x_valid)
plot_series(time_valid, rnn_forecast)
In [98]:
tf.keras.metrics.mean_absolute_error(x_valid, rnn_forecast).numpy()
Out[98]:
47.587708
In [99]:
#-----------------------------------------------------------
# Retrieve the training loss for each training epoch
#-----------------------------------------------------------
loss=history.history['loss']
epochs=range(len(loss)) # Get number of epochs

#------------------------------------------------
# Plot training and validation loss per epoch
#------------------------------------------------
plt.plot(epochs, loss, 'r')
plt.title('Training loss')
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend(["Loss"])

plt.figure()

zoomed_loss = loss[200:]
zoomed_epochs = range(200, len(loss))  # early stopping may end training before epoch 500

#------------------------------------------------
# Plot training and validation loss per epoch
#------------------------------------------------
plt.plot(zoomed_epochs, zoomed_loss, 'r')
plt.title('Training loss')
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend(["Loss"])

plt.figure()
Out[99]:
<Figure size 432x288 with 0 Axes>
<Figure size 432x288 with 0 Axes>

Trying other set of parameters

In [100]:
reset_graph()
tf.keras.backend.clear_session()
np.random.seed(51)

window_size = 32
batch_size = 64

train_set = windowed_dataset(x_train, window_size, batch_size, shuffle_buffer_size)
print(train_set)
print(x_train.shape)

model = tf.keras.models.Sequential([
  tf.keras.layers.Conv1D(filters=16, kernel_size=5,
                      strides=1, padding="causal",
                      activation="relu",
                      input_shape=[None, 1]),
  tf.keras.layers.CuDNNLSTM(32, return_sequences=True),
  tf.keras.layers.CuDNNLSTM(32, return_sequences=True),
  tf.keras.layers.Dense(16, activation="relu"),
  tf.keras.layers.Dense(8, activation="relu"),
  tf.keras.layers.Dense(1),
  tf.keras.layers.Lambda(lambda x: x * 400)
])

es = EarlyStopping(monitor='mean_absolute_error', mode='min', verbose=1, patience=20)

optimizer = tf.keras.optimizers.SGD(lr=1e-6, momentum=0.9)
model.compile(loss=tf.keras.losses.Huber(),
              optimizer=optimizer,
              metrics=["mae"])

history = model.fit(train_set, epochs=400, callbacks=[es])
In [101]:
rnn_forecast = model_forecast(model, series[..., np.newaxis], window_size)
rnn_forecast = rnn_forecast[split_time - window_size:-1, -1, 0]
In [102]:
tf.keras.metrics.mean_absolute_error(x_valid, rnn_forecast).numpy()
Out[102]:
51.362106
In [103]:
tf.keras.metrics.mean_squared_error(x_valid, rnn_forecast).numpy()
Out[103]:
3056.7795

Overall, RNNs and CNNs are very sensitive to the choice of hyperparameters. Most of the time the network will overfit the training set and perform poorly on the validation or test set. To achieve better accuracy, one has to monitor the training and validation/test learning curves and choose the optimal scenario.
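As a minimal sketch of that curve-following idea (with hypothetical loss values, not numbers from the models above): track training and validation loss per epoch and keep the weights from the epoch where the validation loss bottoms out.

```python
import numpy as np

# Hypothetical learning curves; in practice these come from
# history.history['loss'] and history.history['val_loss'].
train_loss = np.array([1.0, 0.6, 0.4, 0.3, 0.25, 0.22])
val_loss   = np.array([1.1, 0.7, 0.5, 0.45, 0.5, 0.6])

# training loss keeps falling, but validation loss rises after
# this epoch: the classic signature of overfitting
best_epoch = int(np.argmin(val_loss))
print(best_epoch)  # 3
```

In Keras this is what `EarlyStopping(..., restore_best_weights=True)` on a validation metric automates.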

Appendix: ARMA on Sunspots data

In [104]:
from scipy import stats
import statsmodels.api as sm
from statsmodels.graphics.api import qqplot
In [105]:
df = pd.read_csv('../data/sunspots.csv', usecols=["Date", "Monthly Mean Total Sunspot Number"],
                   parse_dates=True, infer_datetime_format=True)
df.rename(columns={"Monthly Mean Total Sunspot Number": "SN"}, inplace=True)

df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')

print("Data shape", df.shape)
df.plot(rot=30);
Data shape (3235, 1)
In [106]:
df.iloc[-400:].plot(rot=30);
In [107]:
split_time = 3000

x_train = df.iloc[:split_time]
x_valid = df.iloc[split_time:]

Naive forecast as a first benchmark

In [108]:
naive_forecast = df.iloc[split_time - 1:-1]
In [109]:
time_valid = naive_forecast.index

plt.figure(figsize=(10, 6))
plot_series(time_valid, x_valid, label="validation set")
plot_series(time_valid, naive_forecast, label="naive forecast")

Zoomed version

In [110]:
plt.figure(figsize=(10, 6))
plot_series(time_valid, x_valid, start=0, end=80, label="validation set")
plot_series(time_valid, naive_forecast, start=1, end=81, label="naive forecast")
In [111]:
print(keras.metrics.mean_squared_error(x_valid.values.ravel(), naive_forecast.values.ravel()).numpy())
print(keras.metrics.mean_absolute_error(x_valid.values.ravel(), naive_forecast.values.ravel()).numpy())
523.620255319149
16.136170212765958

Moving average forecast on differenced time series as a second benchmark

In [112]:
def moving_average_forecast(series, window_size):
    """Forecasts the mean of the last few values.
     If window_size=1, this is equivalent to the naive forecast."""
    forecast = []
    for t in range(len(series) - window_size):
        forecast.append(series[t:t + window_size].mean())
    return np.array(forecast)
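On a toy series the function behaves as follows (redefined here so the snippet is self-contained):

```python
import numpy as np

def moving_average_forecast(series, window_size):
    # the mean of the previous window_size values forecasts the next one
    return np.array([series[t:t + window_size].mean()
                     for t in range(len(series) - window_size)])

series = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(moving_average_forecast(series, 2))  # [1.5 2.5 3.5]
print(moving_average_forecast(series, 1))  # [1. 2. 3. 4.], the naive forecast
```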
In [113]:
shift = 12

diff_series = df.diff(shift).dropna()
diff_series.plot()
Out[113]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a2404cbbe0>
In [114]:
window_size = 12

diff_moving_avg = moving_average_forecast(diff_series, window_size)[split_time - window_size - shift:]

plt.figure(figsize=(10, 6))
plot_series(time_valid, diff_series[split_time - shift:], label="validation set")
plot_series(time_valid, diff_moving_avg, label="diff MA")
plt.show()

Now let's bring back the trend and seasonality by adding the past values from t - shift. We should also smooth both past and present values to remove some of the noise.

In [115]:
smooth_window = 10

diff_moving_avg_plus_smooth_past = moving_average_forecast(df[split_time - shift - smooth_window //2
                                                              : -shift + smooth_window // 2],
                                                           smooth_window) + diff_moving_avg

plt.figure(figsize=(10, 6))
plot_series(time_valid, x_valid, label="validation set")
plot_series(time_valid, diff_moving_avg_plus_smooth_past, label="smoothed diff MA")
plt.show()
In [116]:
print(keras.metrics.mean_squared_error(x_valid.values.ravel(), diff_moving_avg_plus_smooth_past.ravel()).numpy())
print(keras.metrics.mean_absolute_error(x_valid.values.ravel(), diff_moving_avg_plus_smooth_past.ravel()).numpy())
742.5168231796689
19.78877304964539

Playing with parameters to improve this result

In [117]:
shifts = range(1, 12, 3)
window_sizes = range(2, 13, 3)
smooth_windows = range(8, 12, 4)

# N.B. We need shift > smooth_window // 2 

shift_best, window_size_best, smooth_window_best, min_MAE = 0, 0, 0, np.inf

for shift in shifts:
    for window_size in window_sizes:
        for smooth_window in smooth_windows:
            if shift > smooth_window // 2:
                diff_series = df.diff(shift).dropna()
                diff_moving_avg = moving_average_forecast(diff_series, window_size)[split_time - window_size - shift:]
                cdf = df[split_time-shift-smooth_window//2:-shift+smooth_window//2] # centered MA of past series
                past_forecast = moving_average_forecast(cdf, smooth_window)
                diff_moving_avg_plus_smooth_past = past_forecast + diff_moving_avg
                print(f"For shift={shift}, window_size={window_size}, smooth_window={smooth_window}")

                mae = keras.metrics.mean_absolute_error(x_valid.values.ravel(), diff_moving_avg_plus_smooth_past.ravel()).numpy()

                if mae < min_MAE:
                    min_MAE = mae
                    shift_best, window_size_best, smooth_window_best = shift, window_size, smooth_window

                print("MSE", keras.metrics.mean_squared_error(x_valid.values.ravel(), diff_moving_avg_plus_smooth_past.ravel()).numpy())
                print("MAE", mae, end="\n\n")

print("Best values are:", shift_best, window_size_best, smooth_window_best, min_MAE)
For shift=7, window_size=2, smooth_window=8
MSE 811.0970365691489
MAE 20.25494680851064

For shift=7, window_size=5, smooth_window=8
MSE 650.9805821010638
MAE 17.962521276595744

For shift=7, window_size=8, smooth_window=8
MSE 665.3434441489361
MAE 18.11702127659574

For shift=7, window_size=11, smooth_window=8
MSE 644.7816629099261
MAE 18.082877176015472

For shift=10, window_size=2, smooth_window=8
MSE 633.6567041223403
MAE 18.137287234042553

For shift=10, window_size=5, smooth_window=8
MSE 527.1301196542553
MAE 15.901138297872341

For shift=10, window_size=8, smooth_window=8
MSE 602.6832499999999
MAE 17.704893617021273

For shift=10, window_size=11, smooth_window=8
MSE 676.7375347448128
MAE 18.863331721470015

Best values are: 10 5 8 15.901138297872341
In [118]:
shift, window_size, smooth_window = shift_best, window_size_best, smooth_window_best

diff_series = df.diff(shift).dropna()
diff_moving_avg = moving_average_forecast(diff_series, window_size)[split_time - window_size - shift:]
cdf = df[split_time-shift-smooth_window//2:-shift+smooth_window//2] # centered MA of past series
past_forecast = moving_average_forecast(cdf, smooth_window)
diff_moving_avg_plus_smooth_past = past_forecast + diff_moving_avg
print(f"For shift={shift}, window_size={window_size}, smooth_window={smooth_window}")

mae = keras.metrics.mean_absolute_error(x_valid.values.ravel(), diff_moving_avg_plus_smooth_past.ravel()).numpy()
print("MSE", keras.metrics.mean_squared_error(x_valid.values.ravel(), diff_moving_avg_plus_smooth_past.ravel()).numpy())
print("MAE", mae, end="\n\n")
For shift=10, window_size=5, smooth_window=8
MSE 527.1301196542553
MAE 15.901138297872341

Trend-Seasonal-Noise decomposition

In [119]:
decomposition = sm.tsa.seasonal_decompose(df.SN, freq=11*12)
fig = decomposition.plot();
In [120]:
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(decomposition.resid, lags=60, ax=ax1)
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(df, lags=60, ax=ax2)

ACF/PACF on the original data

In [121]:
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(df.squeeze(), lags=40, ax=ax1)
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(df, lags=40, ax=ax2)
In [122]:
aic = pd.DataFrame(np.zeros((6,6), dtype=float))

warnings.simplefilter('ignore')

# Iterate over all ARMA(p,q) models with p,q in [0,5]
best_p, best_q, min_aic = 0, 0, np.inf
for p in range(6):
    for q in range(6):
        if p == 0 and q == 0:
            continue
        mod = sm.tsa.statespace.SARIMAX(x_train, order=(p,0,q), seasonal_order=(1,1,0,12),
                                        simple_differencing=True, enforce_invertibility=False)
        try:
            res = mod.fit(disp=False)
            AIC = res.aic
            aic.iloc[p,q] = AIC
            if AIC < min_aic:
                best_p, best_q, min_aic = p, q, AIC
        except Exception:
            aic.iloc[p,q] = np.nan

print(f"Best (p, q)=({best_p}, {best_q})")
Best (p, q)=(4, 5)
In [123]:
aic
Out[123]:
0 1 2 3 4 5
0 0.000000 30750.647457 30210.212235 29963.262604 29795.734448 29614.417172
1 29405.519284 29043.501475 29004.937546 29006.801801 28997.364480 28995.430752
2 29199.071081 29009.203938 29006.872796 29007.648278 28941.325448 28965.091937
3 29091.612846 29003.819405 28964.590912 29005.157960 28923.585496 28914.495604
4 29022.810980 28998.106799 28988.502070 28921.849884 28905.376303 28828.720720
5 29004.775143 28999.252070 28915.868625 28853.309736 28923.502301 28908.068575
In [124]:
arma_mod = sm.tsa.ARMA(x_train, (best_p, best_q)).fit(disp=False)
print(arma_mod.summary())
                              ARMA Model Results
==============================================================================
Dep. Variable:                     SN   No. Observations:                 3000
Model:                     ARMA(4, 5)   Log Likelihood              -13898.577
Method:                       css-mle   S.D. of innovations             24.864
Date:                Thu, 07 Nov 2019   AIC                          27819.154
Time:                        17:08:10   BIC                          27885.224
Sample:                    01-31-1749   HQIC                         27842.919
                         - 12-31-1998
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         83.5199      4.403     18.970      0.000      74.891      92.149
ar.L1.SN       0.6159      0.042     14.698      0.000       0.534       0.698
ar.L2.SN       0.8238      0.084      9.831      0.000       0.660       0.988
ar.L3.SN       0.4219      0.086      4.895      0.000       0.253       0.591
ar.L4.SN      -0.8694      0.043    -20.357      0.000      -0.953      -0.786
ma.L1.SN      -0.0958      0.045     -2.112      0.035      -0.185      -0.007
ma.L2.SN      -0.8142      0.063    -12.971      0.000      -0.937      -0.691
ma.L3.SN      -0.7435      0.053    -14.161      0.000      -0.846      -0.641
ma.L4.SN       0.5539      0.019     28.470      0.000       0.516       0.592
ma.L5.SN       0.1749      0.021      8.277      0.000       0.133       0.216
                                    Roots
=============================================================================
                  Real          Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1           -0.7666           -0.7340j            1.0614           -0.3785
AR.2           -0.7666           +0.7340j            1.0614            0.3785
AR.3            1.0093           -0.0486j            1.0105           -0.0077
AR.4            1.0093           +0.0486j            1.0105            0.0077
MA.1           -0.7497           -0.7209j            1.0400           -0.3781
MA.2           -0.7497           +0.7209j            1.0400            0.3781
MA.3            1.1530           -0.0276j            1.1533           -0.0038
MA.4            1.1530           +0.0276j            1.1533            0.0038
MA.5           -3.9740           -0.0000j            3.9740           -0.5000
-----------------------------------------------------------------------------
In [125]:
fig, ax = plt.subplots(figsize=(10,8))
fig = arma_mod.plot_predict(start=2500, end=x_train.shape[0]+20, dynamic=True, ax=ax, plot_insample=True)
legend = ax.legend(loc='upper left')
In [ ]:
mod = sm.tsa.statespace.SARIMAX(x_train,
                                order=(best_p, 0, best_q),
                                seasonal_order=(1, 1, 0, 11*12),
                                simple_differencing=True, enforce_invertibility=False
                                #enforce_stationarity=False,
                                #enforce_invertibility=False
                               )
results = mod.fit()
print(results.summary().tables[1])

results.plot_diagnostics(figsize=(16, 8))
plt.show()

pred = results.get_prediction(start=pd.to_datetime('1999-01-31'), dynamic=False)
pred_ci = pred.conf_int()
ax = df.SN.plot(label='observed')

pred.predicted_mean.plot(ax=ax, label='One-step ahead Forecast', alpha=.7, figsize=(14, 7))
#ax.fill_between(pred_ci.index, pred_ci.values[0][0], pred_ci.values[0][1], color='k', alpha=.2)
ax.set_xlabel('Date')
ax.set_ylabel('Sunspot Number')
plt.legend(loc="best")
plt.show()

Does our model obey the theory?

In [130]:
sm.stats.durbin_watson(arma_mod.resid)
Out[130]:
1.9766149750194713
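The Durbin-Watson statistic is approximately 2(1 - r1), where r1 is the lag-1 autocorrelation of the residuals, so a value near 2 (as here) indicates no first-order autocorrelation. A small NumPy sketch of the statistic itself:

```python
import numpy as np

def durbin_watson(resid):
    # DW = sum (e_t - e_{t-1})^2 / sum e_t^2; values near 2 indicate
    # no first-order autocorrelation in the residuals
    d = np.diff(resid)
    return (d @ d) / (resid @ resid)

rng = np.random.RandomState(0)
e = rng.randn(10000)          # white-noise residuals
print(durbin_watson(e))       # close to 2
```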
In [131]:
fig = plt.figure(figsize=(12,8))
ax = fig.add_subplot(111)
ax = arma_mod.resid.plot();
In [132]:
resid = arma_mod.resid
stats.normaltest(resid)
Out[132]:
NormaltestResult(statistic=381.66911291741155, pvalue=1.3231380948913631e-83)
In [133]:
fig = plt.figure(figsize=(12,8))
ax = fig.add_subplot(111)
fig = qqplot(resid, line='q', ax=ax, fit=True)
In [134]:
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(resid.squeeze(), lags=40, ax=ax1)
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(resid, lags=40, ax=ax2)
In [135]:
r, q, p = sm.tsa.acf(resid.squeeze(), fft=True, qstat=True)
data = np.c_[range(1,41), r[1:], q, p]
table = pd.DataFrame(data, columns=['lag', "AC", "Q", "Prob(>Q)"])
print(table.set_index('lag'))
            AC          Q  Prob(>Q)
lag
1.0   0.011621   0.405532  0.524246
2.0   0.044856   6.449827  0.039759
3.0  -0.012934   6.952541  0.073426
4.0  -0.036822  11.028166  0.026249
5.0  -0.019880  12.216634  0.031937
6.0  -0.005251  12.299562  0.055610
7.0  -0.065839  25.343113  0.000660
8.0  -0.017411  26.255617  0.000950
9.0   0.029087  28.803180  0.000699
10.0  0.028513  31.252023  0.000533
11.0  0.030135  33.988177  0.000363
12.0  0.010804  34.339992  0.000596
13.0  0.010609  34.679345  0.000948
14.0  0.023834  36.392614  0.000911
15.0  0.040230  41.275661  0.000290
16.0  0.010846  41.630668  0.000448
17.0  0.001227  41.635210  0.000757
18.0 -0.037532  45.889608  0.000308
19.0 -0.009998  46.191598  0.000466
20.0 -0.015605  46.927511  0.000600
21.0 -0.029210  49.506942  0.000427
22.0 -0.028035  51.883909  0.000324
23.0  0.038092  56.273478  0.000130
24.0 -0.020476  57.542239  0.000141
25.0  0.047019  64.234900  0.000027
26.0  0.043705  70.019185  0.000007
27.0 -0.012013  70.456308  0.000010
28.0 -0.016127  71.244442  0.000012
29.0 -0.016260  72.045901  0.000016
30.0  0.008806  72.281068  0.000024
31.0 -0.013301  72.817695  0.000032
32.0  0.051784  80.954551  0.000004
33.0 -0.005060  81.032278  0.000006
34.0 -0.005119  81.111832  0.000010
35.0 -0.011940  81.544865  0.000014
36.0 -0.011495  81.946355  0.000019
37.0 -0.005340  82.033020  0.000029
38.0 -0.000388  82.033477  0.000045
39.0 -0.022346  83.552256  0.000044
40.0  0.020393  84.817573  0.000046
In [136]:
x_train.index
Out[136]:
DatetimeIndex(['1749-01-31', '1749-02-28', '1749-03-31', '1749-04-30',
               '1749-05-31', '1749-06-30', '1749-07-31', '1749-08-31',
               '1749-09-30', '1749-10-31',
               ...
               '1998-03-31', '1998-04-30', '1998-05-31', '1998-06-30',
               '1998-07-31', '1998-08-31', '1998-09-30', '1998-10-31',
               '1998-11-30', '1998-12-31'],
              dtype='datetime64[ns]', name='Date', length=3000, freq=None)
In [137]:
x_valid.index
Out[137]:
DatetimeIndex(['1999-01-31', '1999-02-28', '1999-03-31', '1999-04-30',
               '1999-05-31', '1999-06-30', '1999-07-31', '1999-08-31',
               '1999-09-30', '1999-10-31',
               ...
               '2017-10-31', '2017-11-30', '2017-12-31', '2018-01-31',
               '2018-02-28', '2018-03-31', '2018-04-30', '2018-05-31',
               '2018-06-30', '2018-07-31'],
              dtype='datetime64[ns]', name='Date', length=235, freq=None)
In [138]:
predict_sunspots = arma_mod.predict('1999-01-31', '2018-07-31', dynamic=True)
In [139]:
fig, ax = plt.subplots(figsize=(12, 8))
fig = arma_mod.plot_predict('1991-01-31', '2018-07-31', dynamic=True, ax=ax, plot_insample=True)
In [140]:
def mean_absolute_forecast_err(y, yhat):
    return np.abs(y.values - yhat.values.reshape(-1, 1)).mean(axis=0)[0]
In [141]:
mean_absolute_forecast_err(x_valid, predict_sunspots)
Out[141]:
36.82632027259433
In [142]:
# Statespace
mod = sm.tsa.statespace.SARIMAX(x_train, order=(best_p,0,best_q))
res = mod.fit(disp=False)
print(res.summary())
                           Statespace Model Results
==============================================================================
Dep. Variable:                     SN   No. Observations:                 3000
Model:               SARIMAX(4, 0, 5)   Log Likelihood              -13937.657
Date:                Thu, 07 Nov 2019   AIC                          27895.314
Time:                        17:22:51   BIC                          27955.378
Sample:                    01-31-1749   HQIC                         27916.919
                         - 12-31-1998
Covariance Type:                  opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          0.9802      2.493      0.393      0.694      -3.907       5.867
ar.L2          0.9623      4.930      0.195      0.845      -8.701      10.625
ar.L3         -0.9732      2.533     -0.384      0.701      -5.938       3.991
ar.L4          0.0288      0.197      0.146      0.884      -0.357       0.415
ma.L1         -0.4192      2.493     -0.168      0.866      -5.305       4.466
ma.L2         -1.0996      3.531     -0.311      0.755      -8.020       5.821
ma.L3          0.3808      0.817      0.466      0.641      -1.220       1.981
ma.L4          0.1493      0.195      0.768      0.443      -0.232       0.531
ma.L5          0.0772      0.194      0.399      0.690      -0.302       0.457
sigma2       633.9674     10.538     60.163      0.000     613.314     654.621
===================================================================================
Ljung-Box (Q):                       91.30   Jarque-Bera (JB):              1164.39
Prob(Q):                              0.00   Prob(JB):                         0.00
Heteroskedasticity (H):               1.11   Skew:                             0.33
Prob(H) (two-sided):                  0.11   Kurtosis:                         5.98
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
In [143]:
# In-sample one-step-ahead predictions, and out-of-sample forecasts
nforecast = len(x_valid)

predict = res.get_prediction(end=mod.nobs + nforecast)
idx = np.arange(len(predict.predicted_mean))
predict_ci = predict.conf_int(alpha=0.5)

# # Graph
fig, ax = plt.subplots(figsize=(12,6))
ax.xaxis.grid()
#ax.plot(df, 'k.')
ax.plot(idx[-nforecast:], x_valid[-nforecast:], 'g--', linestyle='-', linewidth=1.3)

# Plot
ax.plot(idx[:-nforecast], predict.predicted_mean[:-nforecast], 'gray', linewidth=1.5)
ax.plot(idx[-nforecast:], predict.predicted_mean[-nforecast:], 'k--', linestyle='--', linewidth=1.7)
ax.fill_between(idx, predict_ci.values[:, 0], predict_ci.values[:, 1], alpha=0.5);
In [144]:
mod.nobs, x_valid.SN.shape
Out[144]:
(3000, (235,))
In [145]:
predict_sunspots = res.predict(end=mod.nobs)
mean_absolute_forecast_err(x_train, predict_sunspots[:-1])
Out[145]:
17.940801919111408
In [146]:
predict_sunspots = res.predict(start=mod.nobs, end=mod.nobs + nforecast)
mean_absolute_forecast_err(x_valid, predict_sunspots[:-1])
Out[146]:
40.56498736960781

ARMA: Artificial data

In [147]:
from statsmodels.tsa.arima_process import arma_generate_sample
In [148]:
np.random.seed(1234)
# include zero-th lag

arparams = np.array([.75, -.25])
maparams = np.array([.65, .35])

The conventions of the arma_generate_sample function require that we specify a 1 for the zero-lag of the AR and MA parameters and that the AR parameters be negated.

In [149]:
arparams = np.r_[1, -arparams]
maparams = np.r_[1, maparams]
nobs = 250
y = arma_generate_sample(arparams, maparams, nobs)

Now, optionally, we can add some dates information. For this example, we will use a pandas time series.

In [150]:
dates = sm.tsa.datetools.dates_from_range('1980m1', length=nobs)
y = pd.Series(y, index=dates)
arma_mod = sm.tsa.ARMA(y, order=(2,2))
arma_res = arma_mod.fit(trend='nc', disp=-1)
In [151]:
print(arma_res.summary())
                              ARMA Model Results
==============================================================================
Dep. Variable:                      y   No. Observations:                  250
Model:                     ARMA(2, 2)   Log Likelihood                -349.537
Method:                       css-mle   S.D. of innovations              0.975
Date:                Thu, 07 Nov 2019   AIC                            709.073
Time:                        17:23:00   BIC                            726.680
Sample:                    01-31-1980   HQIC                           716.160
                         - 10-31-2000
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1.y        0.8119      0.143      5.687      0.000       0.532       1.092
ar.L2.y       -0.3481      0.109     -3.195      0.002      -0.562      -0.135
ma.L1.y        0.5194      0.141      3.691      0.000       0.244       0.795
ma.L2.y        0.3492      0.106      3.291      0.001       0.141       0.557
                                    Roots
=============================================================================
                  Real          Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1            1.1663           -1.2299j            1.6950           -0.1292
AR.2            1.1663           +1.2299j            1.6950            0.1292
MA.1           -0.7437           -1.5200j            1.6922           -0.3224
MA.2           -0.7437           +1.5200j            1.6922            0.3224
-----------------------------------------------------------------------------
In [152]:
fig, ax = plt.subplots(figsize=(10,8))
fig = arma_res.plot_predict(start='1999-06-30', end='2001-05-31', ax=ax)
legend = ax.legend(loc='upper left')

Convert the Jupyter notebook into an HTML file in a given format

In [ ]:
notebook_file = r"TS5_Deep_Neural_Networks_for_Time_Series.ipynb"
html_converter(notebook_file)